A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning
Abstract
Decision-making by agents in games is commonly based on reinforcement learning. To improve the quality of agents, the learning time and state space that learning requires must be reduced. Macro-Actions, which are defined and executed as sequences of primitive actions, address both problems: they shorten learning time by cutting down the number of policy decisions an agent must make. Macro-Actions were originally defined as combinations of the same primitive action. Following studies that generated Macro-Actions by learning, they are now thought to consist of diverse kinds of primitive actions. However, generating Macro-Actions itself requires an enormous amount of learning time and state space. To resolve these issues, insights from studies on learning tasks through Programming by Demonstration (PbD) can be applied to generate Macro-Actions with less learning time and a smaller state space. In this paper, we propose a method to define and execute Macro-Actions: Macro-Actions are learned from a human subject via PbD, and a policy is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data were collected from the driving control of a human subject, and the Macro-Actions required for driving a car were generated. The policy necessary for driving on a track was then learned. Acquiring Macro-Actions by PbD reduced the driving time by about 16% compared with the case in which Macro-Actions were defined directly by a human subject. In addition, the learning time was reduced because the optimum policies converged faster.
Keywords: Reinforcement Learning, Monte Carlo Method, Behavior Generation Model, Programming By Demonstration, Macro-Action, Multi-Step Action
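The idea of a Macro-Action as a sequence of primitive actions, obtained from a demonstration and executed with a single policy decision, can be illustrated with a minimal sketch. This is not the paper's implementation. The frequency-based extraction, the function names (`extract_macro_actions`, `run_macro`, `toy_step`), the primitive action names, and the one-dimensional track are all assumptions made for illustration, since the abstract does not specify how demonstrations are segmented.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Primitive = str
MacroAction = Tuple[Primitive, ...]


def extract_macro_actions(
    demo: Sequence[Primitive], length: int, top_k: int
) -> List[MacroAction]:
    """Return the top_k most frequent contiguous subsequences of a demonstration.

    Assumption: n-gram frequency stands in for the segmentation method,
    which the abstract does not describe.
    """
    counts = Counter(
        tuple(demo[i:i + length]) for i in range(len(demo) - length + 1)
    )
    return [macro for macro, _ in counts.most_common(top_k)]


def run_macro(
    state: int,
    macro: MacroAction,
    step: Callable[[int, Primitive], Tuple[int, float]],
) -> Tuple[int, float]:
    """Execute every primitive of a macro-action in order.

    One policy decision (choosing the macro) covers len(macro) environment steps.
    """
    total_reward = 0.0
    for primitive in macro:
        state, reward = step(state, primitive)
        total_reward += reward
    return state, total_reward


def toy_step(position: int, primitive: Primitive) -> Tuple[int, float]:
    """Hypothetical 1-D track: each primitive moves the car and costs one time unit."""
    delta = {"accelerate": 2, "steer": 1, "brake": 0}[primitive]
    return position + delta, -1.0


# Hypothetical demonstration recorded from a human driver.
demo = ["accelerate", "accelerate", "steer",
        "accelerate", "accelerate", "steer", "brake"]
macros = extract_macro_actions(demo, length=2, top_k=2)
final_state, total = run_macro(0, macros[0], toy_step)
print(macros, final_state, total)
```

With macro-actions of length 2, a policy makes half as many decisions per episode as it would with primitive actions alone. That reduction in policy decisions is the source of the learning-time savings the abstract describes.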
Related resources
Control Behavior of 3D Humanoid Animation Object Using Reinforcement Learning
The ability to learn is a potentially compelling and important quality for interactive 3D human avatars or virtual humans. To that end, we describe a practical approach to real-time learning for 3D virtual humans. Our implementation is grounded in the techniques of reinforcement learning and informed by insights from avatar’s behavior training. It simulates the learning task for characters by e...
Teaching Virtual Characters How to Use Body Language
Non-verbal communication, or “body language”, is a critical component in constructing believable virtual characters. Most often, body language is implemented by a set of ad-hoc rules. We propose a new method for authors to specify and refine their character’s body-language responses. Using our method, the author watches the character acting in a situation, and provides simple feedback on-line. ...
Description and Acquirement of Macro-Actions in Reinforcement Learning
Reinforcement learning is a framework for enabling agents to learn from interaction with environments. It has generally focused on Markov decision process (MDP) domains, but a real-world domain may be non-Markovian. In this paper, we develop a new description of macro-actions for non-Markov decision process (NMDP) domains in reinforcement learning. A macro-action is an action control struct...
Wp-dyna: Planning and Reinforcement Learning in Well-plannable Environments
Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal policy in RL may be very slow. To speed up learning, one often-used solution is the integration of planning, for example, Sutton’s Dyna algorithm, or various oth...
Macro-Actions in Reinforcement Learning: An Empirical Analysis, Amy McGovern and Richard
Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on t...
Journal title: JIPS
Volume 8
Publication date: 2012